Regenerating Hypotheses for Statistical Machine Translation

نویسندگان

  • Boxing Chen
  • Min Zhang
  • AiTi Aw
  • Haizhou Li
چکیده

This paper studies three techniques that improve the quality of N-best hypotheses through additional regeneration process. Unlike the multi-system consensus approach where multiple translation systems are used, our improvement is achieved through the expansion of the Nbest hypotheses from a single system. We explore three different methods to implement the regeneration process: redecoding, n-gram expansion, and confusion network-based regeneration. Experiments on Chinese-to-English NIST and IWSLT tasks show that all three methods obtain consistent improvements. Moreover, the combination of the three strategies achieves further improvements and outperforms the baseline by 0.81 BLEU-score on IWSLT’06, 0.57 on NIST’03, 0.61 on NIST’05 test set respectively.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Machine Learning Approach to Hypotheses Selection of Greedy Decoding for SMT

This paper proposes a method for integrating example-based and rule-based machine translation systems with statistical methods. It extends a greedy decoder for statistical machine translation (SMT), which searches for an optimal translation by using SMT models starting from a decoder seed, i.e., the source language input paired with an initial translation hypothesis. In order to reduce local op...

متن کامل

Lexical Micro-adaptation in Statistical Machine Translation

We introduce a generic framework in Statistical Machine Translation (SMT) in which lexical hypotheses, in the form of a target language model local to the input sentence, are used to guide the search for the best translation, thus performing a lexical microadaptation. An instantiation of this framework is presented and evaluated on three language pairs, where these auxiliary hypotheses are deri...

متن کامل

Hybrid Decoding: Decoding with Partial Hypotheses Combination over Multiple SMT Systems

In this paper, we present hybrid decoding — a novel statistical machine translation (SMT) decoding paradigm using multiple SMT systems. In our work, in addition to component SMT systems, system combination method is also employed in generating partial translation hypotheses throughout the decoding process, in which smaller hypotheses generated by each component decoder and hypotheses combinatio...

متن کامل

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...

متن کامل

Computing multiple weighted reordering hypotheses for a statistical machine translation phrase-based systems

Reordering is one source of error in statistical machine translation (SMT). This paper extends the study of the statistical machine reordering (SMR) approach, which uses the powerful techniques of the SMT systems to solve reordering problems. Here, the novelties yield in: (1) using the SMR approach in a SMT phrase-based system, (2) adding a feature function in the SMR step, and (3) analyzing th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008